Audio Segment Retrieval Using a Synthesized HMM
Authors
Abstract
In this paper, we propose a general approach to audio segment retrieval using synthesized HMMs. The approach lets a user query audio data of any length with one or more example audio segments and find similar segments. The basic idea is to first train a theme HMM on the given examples and a general background HMM on all the audio data, and then combine these individual HMMs into a synthesized “Background-Theme-Background” HMM. This synthesized HMM can then be applied to any audio stream as a parser to detect the most likely theme segment. A major advantage of this approach is that, unlike previous work, it does not assume any predefined segment boundaries, so it can be expected to retrieve theme segments with more accurate boundaries. Preliminary experiments on music detection have shown promising results.
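The chaining and parsing steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `GaussHMM`, `synthesize`, `viterbi`, `find_theme` and the exit probability `p_exit` are hypothetical; the models use 1-D Gaussian emissions with hand-set parameters in place of HMMs trained on real audio features; and Viterbi decoding is assumed as the "parser" that picks out the most likely theme segment.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GaussHMM:
    """Hypothetical stand-in for a trained HMM with 1-D Gaussian emissions."""
    init: List[float]          # initial state probabilities
    trans: List[List[float]]   # trans[i][j] = P(next state j | state i)
    means: List[float]
    variances: List[float]


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def synthesize(bg: GaussHMM, theme: GaussHMM, p_exit: float):
    """Chain background -> theme -> background into one HMM.

    Assumption: every non-final block leaves with a fixed probability p_exit,
    entering the next block according to that block's initial distribution.
    """
    blocks = [(bg, "B"), (theme, "T"), (bg, "B")]
    sizes = [len(h.init) for h, _ in blocks]
    offsets = [sum(sizes[:k]) for k in range(len(blocks))]
    n = sum(sizes)
    trans = [[0.0] * n for _ in range(n)]
    means, variances, labels = [], [], []
    for k, (h, lab) in enumerate(blocks):
        off, last = offsets[k], k == len(blocks) - 1
        stay = 1.0 if last else 1.0 - p_exit
        for i in range(sizes[k]):
            means.append(h.means[i])
            variances.append(h.variances[i])
            labels.append(lab)
            for j in range(sizes[k]):  # transitions inside the block
                trans[off + i][off + j] = stay * h.trans[i][j]
            if not last:  # transitions into the next block's entry states
                nxt, noff = blocks[k + 1][0], offsets[k + 1]
                for j in range(sizes[k + 1]):
                    trans[off + i][noff + j] = p_exit * nxt.init[j]
    # The parse always starts in the leading background block.
    init = list(bg.init) + [0.0] * (n - sizes[0])
    return init, trans, means, variances, labels


def viterbi(obs, init, trans, means, variances) -> List[int]:
    """Most likely state sequence, computed in log space."""
    n = len(init)
    lg = lambda p: math.log(p) if p > 0 else float("-inf")
    log_a = [[lg(p) for p in row] for row in trans]
    delta = [lg(init[s]) + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] + log_a[i][j])
            ptr.append(best)
            new.append(delta[best] + log_a[best][j] + log_gauss(x, means[j], variances[j]))
        back.append(ptr)
        delta = new
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def find_theme(obs, bg: GaussHMM, theme: GaussHMM, p_exit: float = 0.05) -> Tuple[int, int]:
    """Return (first, last) frame index decoded as theme, or (-1, -1) if none."""
    init, trans, means, variances, labels = synthesize(bg, theme, p_exit)
    path = viterbi(obs, init, trans, means, variances)
    idx = [t for t, s in enumerate(path) if labels[s] == "T"]
    return (idx[0], idx[-1]) if idx else (-1, -1)


# Toy usage with hand-set one-state models (not trained on real audio).
bg = GaussHMM([1.0], [[1.0]], [0.0], [1.0])
theme = GaussHMM([1.0], [[1.0]], [5.0], [1.0])
obs = [0.1, -0.2, 0.3, 5.1, 4.8, 5.3, 4.9, 0.0, 0.2, -0.1]
print(find_theme(obs, bg, theme))  # → (3, 6)
```

Because the chained topology only allows background, then theme, then background, the decoded theme frames are contiguous, so the segment boundaries come out of the decoding itself rather than from any predefined segmentation.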
Similar resources
Precision
As one of the key methods for extracting content semantics and structure from audio, automatic audio classification, especially of speech and music, is valuable for content-based audio retrieval, video summarization and retrieval, spoken document retrieval, etc. Because a hidden Markov model (HMM) can model the temporal statistical properties of audio signals well, a left-right discrete HMM is proposed to c...
HMM Model Selection Issues for Soccer Video
There has been a concerted effort from the Video Retrieval community to develop tools that automate the annotation process of Sports video. In this paper, we provide an in-depth investigation into three Hidden Markov Model (HMM) selection approaches. HMM, a popular indexing framework, is often applied in an ad hoc manner. We investigate what effect, if any, poor HMM selection can have on f...
Reducing over-smoothness in HMM-based speech synthesis using exemplar-based voice conversion
Speech synthesis has been applied in many kinds of practical applications. Currently, state-of-the-art speech synthesis uses statistical methods based on the hidden Markov model (HMM). Speech synthesized by statistical methods tends to be over-smoothed because of the averaging in statistical processing. In the literature, there have been many studies attempting to solve over-smoothness in speech...
Multi-tape finite-state transducer for asynchronous multi-stream pattern recognition with application to speech
In this thesis, we have focused on improving the acoustic modeling of speech recognition systems to increase the overall recognition performance. We formulate a novel multi-stream speech recognition framework using multi-tape finite-state transducers (FSTs). The multi-dimensional input labels of the multi-tape FST transitions specify the acoustic models to be used for the individual feature str...
New approaches to audio-visual segmentation of TV news for automatic topic retrieval
This paper presents two new real-time approaches to segmentation of TV news shows into topics. The goal of this research work is the high precision retrieval of topics from TV news. For that purpose, the detection of correct topic boundaries is of great importance. We introduce a stochastic and a rule-based topic model based on HMMs. The former combines features from the visual as well as from ...
Publication date: 2003